Over the span of this course, we have mainly covered two packages for data visualization: ggplot and ggvis. Both are extremely useful, but each comes with its own pros and cons. While ggplot is a nice, easy-to-use package for basic data visualization, its lack of interactivity can be somewhat limiting for both developers and users. Conversely, ggvis allows for much more complex visualizations since it implements interactive charts and graphs, although, in my opinion, at the expense of “user-friendliness.” The purpose of this post is to introduce a middle ground in the form of plotly, an R package that creates interactive data visualizations in a syntax relatively similar to that of ggplot.
Plotly, a company founded in 2012 and headquartered in Montreal, Canada, maintains an open source visualization library specializing in interactive graphing.1 The library is very much language agnostic: instead of being saved as a traditional image such as a .png file, every chart is described in JSON, which can be read from a wide variety of languages such as Python, JavaScript, and R. Because of this nontraditional storage form, all graphs and charts created using plotly are embeddable and editable on the web, so, unlike ggvis, its interactive graphs still work in knitted HTML files.2 Below are a few examples to familiarize yourself with plotly syntax and compare its usability with that of ggplot and ggvis.
#Packages needed for Examples.
library(plotly)
## Loading required package: ggplot2
##
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
##
## last_plot
## The following object is masked from 'package:stats':
##
## filter
## The following object is masked from 'package:graphics':
##
## layout
library(ggplot2)
library(ggvis)
##
## Attaching package: 'ggvis'
## The following objects are masked from 'package:plotly':
##
## add_data, hide_legend
## The following object is masked from 'package:ggplot2':
##
## resolution
Plotly constructs its graphs using two main functions: plot_ly and ggplotly. The first, plot_ly, is a visualization function like we’ve seen before that takes in a data frame, x and/or y coordinate mappings, type of visualization, and various other optional arguments to customize your graph.3
Goal: Create a graph that shows the distribution of car weights using the wt column of the built-in mtcars data set.
#Graph using 'ggplot2'.
ggplot(data = mtcars,aes(x = wt))+geom_histogram(binwidth = .6)
#Graph using 'ggvis'.
ggvis(data=mtcars,x=~wt)%>%
layer_histograms(width = .6)
#Graphs using 'plot_ly'.
plot_ly(data=mtcars,x=~wt,type = "histogram",xbins=list(start=1.5,end=6,size=.6))
As you can see, plot_ly eliminates the need for secondary functions like geom_histogram or layer_histograms by declaring the chart type and format within the function itself. Also, as in the example above, any attribute made up of several settings, such as xbins, must be passed in the form of a list.4 Additionally, functions like layout can be used with plot_ly for greater graphic control.
plot_ly(data=mtcars,x=~wt,type = "histogram",xbins=list(start=1.5,end=6,size=.6))%>%
layout(xaxis=list(title = 'Car Weights',showgrid=TRUE,nticks=10),
yaxis=list(title = 'Frequency', showgrid=TRUE,nticks=24))
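To make the xbins settings more concrete, here is a minimal base R sketch of the bins they describe. It is not how plotly computes its histogram internally; the variable names (`edges_aligned`, `edges_shifted`, and so on) are made up for this illustration, and the left-closed intervals are an assumption about how the bins should be read. It compares bins of width .6 starting at 1.5 with bins of the same width starting at 1.2.

```r
# Bin edges of width 0.6 starting at 1.5 (enough bins to cover mtcars$wt).
size <- 0.6
edges_aligned <- seq(from = 1.5, by = size, length.out = 8)  # 1.5, 2.1, ..., 5.7

# Count the weights falling in each left-closed interval [a, b).
counts_aligned <- hist(mtcars$wt, breaks = edges_aligned,
                       right = FALSE, plot = FALSE)$counts

# Same width, but a different starting point (an arbitrary choice of 1.2).
edges_shifted <- seq(from = 1.2, by = size, length.out = 9)  # 1.2, 1.8, ..., 6.0
counts_shifted <- hist(mtcars$wt, breaks = edges_shifted,
                       right = FALSE, plot = FALSE)$counts

# Both versions count all 32 cars, but distribute them differently.
counts_aligned
counts_shifted
```

The two count vectors differ even though the width is identical, which is why the start and end values matter as much as the size when bins are specified by hand.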
Notice that all of the graphs accurately depict the given data in accordance with the desired binwidth of .6. However, if you alter the binwidth of the plotly graph without also adjusting the start and end points, the result will be a slightly different graph. To circumvent this problem, plotly contains a second visualization function, ggplotly, which takes in any ggplot chart as an argument and displays it in an interactive format.
#Assign ggplot graph to the object 'plot'.
plot <- ggplot(data = mtcars,aes(x = wt))+geom_histogram(binwidth = .6)
#Call ggplotly on plot object.
ggplotly(plot)
## We recommend that you use the dev version of ggplot2 with `ggplotly()`
## Install it with: `devtools::install_github('hadley/ggplot2')`
We can then manipulate the newly created plotly object just as before:
ggplotly(plot)%>%
layout(xaxis=list(title = 'Car Weights',showgrid=TRUE),
yaxis=list(title = 'Frequency', showgrid=TRUE))
## We recommend that you use the dev version of ggplot2 with `ggplotly()`
## Install it with: `devtools::install_github('hadley/ggplot2')`
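The JSON description mentioned in the introduction can be inspected directly. The sketch below assumes a plotly version that provides `plotly_build()` and `plotly_json()`; the object names (`fig_direct`, `built`, `json_text`) are made up for this illustration.

```r
library(plotly)

# The same histogram as above, built directly with plot_ly.
fig_direct <- plot_ly(data = mtcars, x = ~wt, type = "histogram",
                      xbins = list(start = 1.5, end = 6, size = 0.6))

# plotly_build() evaluates the figure into the nested list that is
# later serialized and handed to the browser.
built <- plotly_build(fig_direct)
trace_type <- built$x$data[[1]]$type  # chart type of the first trace

# With jsonedit = FALSE, plotly_json() returns the JSON text itself.
json_text <- plotly_json(fig_direct, jsonedit = FALSE)

# Peek at the beginning of the JSON description.
substr(json_text, 1, 200)
```

Because the figure is just this text description, any tool that can read JSON can, in principle, pick the chart up from here.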
For the most part, plotly is a very simple yet powerful package. Now, for a more complex example:
Goal: Visualize the number of passengers per month for every year using the built-in AirPassengers data.
For this example, we will be using a scatterplot to show number of passengers per month, setting the months as the x-axis and number of people as the y-axis.
First, since AirPassengers is not in the data frame form that we want, we need to clean the data. Create a new data frame, passengers, whose first column is the vector of months and whose subsequent columns each hold the monthly passenger counts for one year.
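#For appearances abbreviate month names. Note: this creates a copy that masks
#the built-in month.name; base R's month.abb already holds these abbreviations.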
month.name <- substr(month.name,1,3)
passengers=data.frame(Month=month.name)
n=1
for (year in 1949:1960){
passengers[as.character(year)]=AirPassengers[n:(n+11)]
n=n+12
}
head(passengers)
##   Month 1949 1950 1951 1952 1953 1954 1955 1956 1957 1958 1959 1960
## 1   Jan  112  115  145  171  196  204  242  284  315  340  360  417
## 2   Feb  118  126  150  180  196  188  233  277  301  318  342  391
## 3   Mar  132  141  178  193  236  235  267  317  356  362  406  419
## 4   Apr  129  135  163  181  235  227  269  313  348  348  396  461
## 5   May  121  125  172  183  229  234  270  318  355  363  420  472
## 6   Jun  135  149  178  218  243  264  315  374  422  435  472  535
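The loop above fills in one year at a time. As an alternative sketch (not the code used for the rest of this post), the same wide layout can be produced by folding the series into a 12-row matrix. The names `by_year` and `passengers_wide` are made up for this illustration, and the built-in `month.abb` supplies the abbreviated month names.

```r
# AirPassengers is a monthly series running from Jan 1949 to Dec 1960.
# matrix() fills column by column, so each column ends up holding one year.
by_year <- matrix(as.numeric(AirPassengers), nrow = 12,
                  dimnames = list(NULL, as.character(1949:1960)))

# check.names = FALSE keeps the column names as "1949", "1950", ...
# instead of converting them to syntactic names such as "X1949".
passengers_wide <- data.frame(Month = month.abb, by_year,
                              check.names = FALSE)

head(passengers_wide)
```

Columns can then be referenced with backticks, for example `1949`, exactly as in the ggplot calls that follow.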
The first method we will consider involves the use of ggplot.
ggplot(passengers,aes(Month,`1949`,color=Month))+geom_point()
ggplot(passengers,aes(Month,`1950`,color=Month))+geom_point()
Since this package is not interactive, the only possible way to visualize the number of passengers by month for each year is to create a new chart for each subsequent year, which is not very efficient.
The second method we will consider is using the ggvis library to create an interactive scatterplot that changes its points according to the chosen year.
#slider that returns column name of passengers according to desired year.
slider <- input_slider(1949,1960,1950, map=function(x) {as.name(names(passengers[x-1947]))})
ggvis(passengers,x = ~Month,
y=slider) %>%
layer_points(fill=~Month)
## Warning: Can't output dynamic/interactive ggvis plots in a knitr document.
## Generating a static (non-dynamic, non-interactive) version of the plot.
To use plotly, as you will later see, it will be much easier to first reorganize the data frame into only 3 columns: Number (the number of passengers), Month, and Year. (Although it may seem counterproductive to reorganize the data at first, realize that the data analysis cycle is not linear and “re-cleaning” data to suit the developer’s needs is common.)
passengers = data.frame(Number=AirPassengers)
passengers['Month']=month.name
a <- rep(1949,12)
for (year in 1950:1960){
a=c(a,rep(year,12))}
passengers['Year']=a
head(passengers)
##   Number Month Year
## 1    112   Jan 1949
## 2    118   Feb 1949
## 3    132   Mar 1949
## 4    129   Apr 1949
## 5    121   May 1949
## 6    135   Jun 1949
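For comparison, here is a compact sketch of the same three-column layout. It is a variation rather than the code used in this post: the name `passengers_long` is made up for this illustration, Number is stored as a plain numeric vector instead of a ts object, and Month is stored as a factor so that plots keep the months in calendar order rather than alphabetical order.

```r
passengers_long <- data.frame(
  # as.numeric() drops the time-series class from AirPassengers.
  Number = as.numeric(AirPassengers),
  # Twelve months repeated for twelve years, ordered Jan through Dec.
  Month  = factor(rep(month.abb, times = 12), levels = month.abb),
  # Each year repeated once per month.
  Year   = rep(1949:1960, each = 12)
)

str(passengers_long)
```

Whether the extra tidying is worth it depends on the plot; for a quick look, the version above works as well.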
Once we have “re-cleaned” the data, we are ready to graph it using plotly. In this case, it will be easier to construct our chart using ggplot first. When using ggplotly, a slider will automatically be created according to the values mapped to the ‘frame’ aesthetic of our initial plot object, plot. Statically, the graph will show all points of the data frame at once (ggplot itself does not recognize frame, hence the warning below) but, interactively (when we run plot through ggplotly), it will switch from year to year.
plot <- ggplot(passengers, aes(x=Month, y=Number,color=Month)) +
geom_point(aes(frame=Year))
## Warning: Ignoring unknown aesthetics: frame
plot
## Don't know how to automatically pick scale for object of type ts. Defaulting to continuous.
ggplotly(plot)
## We recommend that you use the dev version of ggplot2 with `ggplotly()`
## Install it with: `devtools::install_github('hadley/ggplot2')`
## Don't know how to automatically pick scale for object of type ts. Defaulting to continuous.
Note: Although in this case it is more efficient to use plotly, there are still drawbacks. For example, if we were to construct the chart using plot_ly, we would need to define all of the slider's parameters ourselves, which is a somewhat complex process you can read more about here.6
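Depending on the installed plotly version, plot_ly may also accept a frame argument of its own, which would sidestep most of that manual setup. The following is only a sketch under that assumption; the data frame `air` and the object `fig` are made up for this illustration, and the data is rebuilt here so the example runs on its own.

```r
library(plotly)

# Long-format passenger data: one row per month and year.
air <- data.frame(
  Number = as.numeric(AirPassengers),
  Month  = factor(rep(month.abb, times = 12), levels = month.abb),
  Year   = rep(1949:1960, each = 12)
)

# frame = ~Year asks plotly for one animation frame per year,
# which comes with a slider and a play button.
fig <- plot_ly(data = air, x = ~Month, y = ~Number, frame = ~Year,
               type = "scatter", mode = "markers")

# Optional: label the slider's current value.
fig <- animation_slider(fig, currentvalue = list(prefix = "Year: "))

fig
```

A fully custom slider built through layout remains the more involved route, so the trade-off described in the note still applies when finer control is needed.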
In conclusion, plotly is a very useful and powerful package that allows not only for the creation of interactive plots but also for the conversion of static charts into an interactive format. Its ability to run on multiple platforms also allows for greater collaboration across programming languages. While I have covered some of the more basic functions of plotly, many more features exist, from basic bar charts to complex interactive maps and 3D graphs, which will be useful for more advanced data analysis.7
1:https://en.wikipedia.org/wiki/Plotly
2:https://plot.ly/d3-js-for-r-and-shiny-charts/
3:https://images.plot.ly/plotly-documentation/images/r_cheat_sheet.pdf
4:https://plot.ly/r/#basic-charts
5:https://plot.ly/r/reference/
7:https://moderndata.plot.ly/interactive-r-visualizations-with-d3-ggplot2-rstudio/